Advanced Analytics with R (UG 21-24)
I am Ayush.
I am a researcher working at the intersection of data, law, development and economics.
I teach Data Science using R at the Gokhale Institute of Politics and Economics.
I am an RStudio (Posit) certified tidyverse Instructor.
I am a Researcher at the Oxford Poverty and Human Development Initiative (OPHI), at the University of Oxford.
Reach me
ayush.ap58@gmail.com
ayush.patel@gipe.ac.in
Because there are many promising alternatives to try out!
Therefore, we discuss linear models beyond the least squares estimate, motivated by two concerns:
Prediction Accuracy
Model Interpretability
“This approach involves identifying a subset of the p predictors that we believe to be related to the response. We then fit a model using least squares on the reduced set of variables.”
A least squares model is fit for every possible combination of the \(p\) predictors.
So, if there are 3 predictors (\(p_1, p_2, p_3\)), we fit all \(2^3 - 1 = 7\) non-empty subset models:
model 1 \(y = \beta_0 + \beta_1 p_1\)
model 2 \(y = \beta_0 + \beta_2 p_2\)
model 3 \(y = \beta_0 + \beta_3 p_3\)
.
.
model 7 \(y = \beta_0 + \beta_1 p_1 + \beta_2 p_2 + \beta_3 p_3\)
Select the best model from these seven.
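The enumeration above can be sketched in base R. This is a minimal illustration, not a production routine: it uses the built-in `mtcars` data with three arbitrarily chosen predictors (`wt`, `hp`, `disp`) standing in for \(p_1, p_2, p_3\), and `mpg` as the response.

```r
# Three illustrative predictors, playing the roles of p1, p2, p3.
predictors <- c("wt", "hp", "disp")

# Every non-empty subset of the predictors: 2^3 - 1 = 7 subsets.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit a least squares model for each subset and record its RSS.
fits <- lapply(subsets, function(vars) {
  lm(reformulate(vars, response = "mpg"), data = mtcars)
})
rss <- sapply(fits, function(m) sum(resid(m)^2))

length(fits)  # 7 models, matching the list above
```

Note that the number of models grows as \(2^p\), which is why exhaustive enumeration is only feasible for small \(p\).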
A null model, with no predictors, is defined first. Name it \(M_0\); it simply predicts the sample mean for every observation.
For each \(k\), where \(k = 1, 2, \ldots, p\), select the best model from all \(\binom{p}{k}\) combinations of \(k\) predictors. Use RSS or \(R^2\). Call it \(M_k\).
Select a single best model from \(M_0, M_1, \ldots, M_p\). Use prediction error on a validation set, \(C_p\), AIC, BIC, adjusted \(R^2\), or cross-validation.
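The three steps above can be sketched in base R. This is a hand-rolled illustration on `mtcars` (response `mpg`, predictors `wt`, `hp`, `disp` chosen for the example); in practice a function such as `leaps::regsubsets()` automates the same search.

```r
predictors <- c("wt", "hp", "disp")
p <- length(predictors)

# Step 1: the null model M_0, with no predictors.
best <- list(lm(mpg ~ 1, data = mtcars))

# Step 2: for each k, keep the lowest-RSS model among the
# choose(p, k) candidates with exactly k predictors -> M_k.
for (k in seq_len(p)) {
  fits_k <- lapply(combn(predictors, k, simplify = FALSE), function(vars) {
    lm(reformulate(vars, response = "mpg"), data = mtcars)
  })
  rss_k <- sapply(fits_k, function(m) sum(resid(m)^2))
  best[[k + 1]] <- fits_k[[which.min(rss_k)]]
}

# Step 3: pick a single winner from M_0, ..., M_p by adjusted R^2
# (AIC, BIC, or cross-validation slot in here the same way).
adj_r2 <- sapply(best, function(m) summary(m)$adj.r.squared)
formula(best[[which.max(adj_r2)]])
```

RSS is only used *within* each size \(k\) (step 2); comparing across sizes (step 3) needs a criterion such as adjusted \(R^2\) that penalises extra predictors, since RSS always falls as variables are added.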
Issues: